

Appendix A Details

Neural Information Processing Systems

More details on each of these datasets are given below. This data is referred to as "in-domain" because the validation data is generated using the same process as the training data. As for cache hits, they are also not counted as visits.

Figure 9: MCTS-guided decoding algorithm for symbolic regression, with the pre-trained transformer model used for the expansion and evaluation steps.

This contrasts with the standard MCTS algorithm (Figure 1), which can be used in a similar fashion but without sharing information with the pre-trained transformer. The approach involves fine-tuning an actor-critic-style model to adjust the pre-trained model on a group of symbolic regression instances.
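The decoding loop described above can be sketched as a standard MCTS (selection, expansion, evaluation, backpropagation) in which the pre-trained model supplies expansion priors and leaf evaluations. This is a minimal illustrative sketch, not the paper's implementation: the `priors`/`value` model interface and the `ToyModel` stand-in are hypothetical names, and a toy scoring function replaces the real transformer.

```python
import math

class Node:
    """One decoding state: a partial token sequence."""
    def __init__(self, seq, prior=1.0):
        self.seq = seq
        self.prior = prior
        self.children = {}      # token -> Node
        self.visits = 0
        self.value_sum = 0.0

def uct_score(parent, child, c_puct=1.0):
    # PUCT-style score: mean value (exploitation) + prior-weighted exploration.
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return q + u

def mcts_decode(model, root_seq=(), n_sims=200, max_len=8):
    root = Node(tuple(root_seq))
    for _ in range(n_sims):
        node, path = root, [root]
        # Selection: descend by UCT until reaching a leaf.
        while node.children:
            parent = node
            node = max(node.children.values(),
                       key=lambda ch: uct_score(parent, ch))
            path.append(node)
        # Expansion: the model's next-token priors define the children.
        if len(node.seq) < max_len:
            for tok, p in model.priors(node.seq).items():
                node.children[tok] = Node(node.seq + (tok,), prior=p)
        # Evaluation: the model scores the partial sequence.
        value = model.value(node.seq)
        # Backpropagation: update statistics along the visited path.
        for nd in path:
            nd.visits += 1
            nd.value_sum += value
    return root

class ToyModel:
    """Stand-in for the pre-trained transformer (hypothetical interface)."""
    vocab = (0, 1)
    def priors(self, seq):
        return {t: 1.0 / len(self.vocab) for t in self.vocab}
    def value(self, seq):
        return sum(seq) / max(len(seq), 1)  # toy score: favors 1-tokens
```

With the toy model, the search concentrates visits on the branch the evaluator scores highly, which is the mechanism by which equation-verification feedback would steer decoding.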









Transformer-based Planning for Symbolic Regression

Neural Information Processing Systems

Symbolic regression (SR) is a challenging task in machine learning that involves finding a mathematical expression for a function based on its values. Recent advancements in SR have demonstrated the effectiveness of pre-trained transformer models in generating equations as sequences, leveraging large-scale pre-training on synthetic datasets and offering notable advantages in terms of inference time over classical Genetic Programming (GP) methods. However, these models primarily rely on supervised pre-training objectives borrowed from text generation and overlook equation discovery goals like accuracy and complexity. To address this, we propose TPSR, a Transformer-based Planning strategy for Symbolic Regression that incorporates the Monte Carlo Tree Search (MCTS) planning algorithm into the transformer decoding process. Unlike conventional decoding strategies, TPSR enables the integration of non-differentiable equation verification feedback, such as fitting accuracy and complexity, as an external source of knowledge into the transformer equation generation process. Extensive experiments on various datasets show that our approach outperforms state-of-the-art methods, enhancing the model's fitting-complexity trade-off, extrapolation abilities, and robustness to noise.
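The non-differentiable feedback mentioned in the abstract can be pictured as a scalar reward combining a fitting term with a complexity penalty. The sketch below is one plausible form (an inverse-NMSE fitting term plus an exponential length penalty); the function name and the `lam`/`max_len` parameters are assumptions for illustration, not the paper's exact formulation.

```python
import math

def tpsr_style_reward(y_true, y_pred, complexity, lam=0.1, max_len=30):
    """Toy equation-verification reward: fitting accuracy + complexity bonus."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    var = sum((y - mean_y) ** 2 for y in y_true) / n or 1e-12
    # Normalized mean squared error of the candidate equation's predictions.
    nmse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n / var
    fit = 1.0 / (1.0 + nmse)          # in (0, 1], 1.0 for a perfect fit
    penalty = lam * math.exp(-complexity / max_len)  # shorter = larger bonus
    return fit + penalty
```

Because the reward is just a scalar, it can score complete (non-differentiable) equations during search, which is exactly the kind of external signal the MCTS evaluation step can consume.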


Symbolic Regression via Deep Reinforcement Learning Enhanced Genetic Programming Seeding

Neural Information Processing Systems

Symbolic regression is the process of identifying mathematical expressions that fit observed output from a black-box process. It is a discrete optimization problem generally believed to be NP-hard. Prior approaches to solving the problem include neural-guided search (e.g. using reinforcement learning) and genetic programming. In this work, we introduce a hybrid neural-guided/genetic programming approach to symbolic regression and other combinatorial optimization problems. We propose a neural-guided component used to seed the starting population of a random restart genetic programming component, gradually learning better starting populations. On a number of common benchmark tasks to recover underlying expressions from a dataset, our method recovers 65% more expressions than a recently published top-performing model using the same experimental setup. We demonstrate that running many genetic programming generations independently of the neural-guided component performs better for symbolic regression than alternative formulations where the two are more strongly coupled. Finally, we introduce a new set of 22 symbolic regression benchmark problems with increased difficulty over existing benchmarks.
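The seeding idea above can be sketched with a toy stand-in: a biased sampler plays the role of the learned neural-guided model, proposing the starting population for each genetic-programming restart, which then evolves independently. Bitstring optimization replaces expression search, and all names (`neural_sampler`, `gp_restart`, `seeded_search`) are hypothetical, not the paper's code.

```python
import random

def fitness(ind):
    return sum(ind)  # toy objective: maximize the number of 1s

def neural_sampler(n, length, p_one=0.7, rng=None):
    # A trained model would propose promising candidates; here a simple
    # bias toward 1s mimics a learned prior over good solutions.
    rng = rng or random
    return [[1 if rng.random() < p_one else 0 for _ in range(length)]
            for _ in range(n)]

def gp_restart(seed_pop, generations=20, rng=None):
    """One GP run evolving independently from its (neural-seeded) start."""
    rng = rng or random
    pop = [list(ind) for ind in seed_pop]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: len(pop) // 2]           # truncation selection
        children = []
        while len(survivors) + len(children) < len(pop):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(a))
            child = a[:cut] + b[cut:]              # one-point crossover
            i = rng.randrange(len(child))
            child[i] ^= rng.random() < 0.1         # bit-flip mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

def seeded_search(restarts=3, pop_size=20, length=16, rng=None):
    rng = rng or random.Random(0)
    best = None
    for _ in range(restarts):
        seeds = neural_sampler(pop_size, length, rng=rng)  # seed each restart
        cand = gp_restart(seeds, rng=rng)
        if best is None or fitness(cand) > fitness(best):
            best = cand
    return best
```

In the paper's full method, the neural component would also be updated from the GP results so that later restarts begin from better populations; this sketch keeps only the one-way seeding direction.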